Distributed Machine Learning via Sufficient Factor Broadcasting

نویسندگان

Pengtao Xie

Jin Kyu Kim

Yi Zhou

Qirong Ho

Abhimanu Kumar

Yaoliang Yu

Eric P. Xing

چکیده

Matrix-parametrized models, including multiclass logistic regression and sparse coding, are used in machine learning (ML) applications ranging from computer vision to computational biology. When these models are applied to large-scale ML problems starting at millions of samples and tens of thousands of classes, their parameter matrix can grow at an unexpected rate, resulting in high parameter synchronization costs that greatly slow down distributed learning. To address this issue, we propose a Sufficient Factor Broadcasting (SFB) computation model for efficient distributed learning of a large family of matrix-parameterized models, which share the following property: the parameter update computed on each data sample is a rank-1 matrix, i.e. the outer product of two “sufficient factors” (SFs). By broadcasting the SFs among worker machines and reconstructing the update matrices locally at each worker, SFB improves communication efficiency — communication costs are linear in the parameter matrix’s dimensions, rather than quadratic — without affecting computational correctness. We present a theoretical convergence analysis of SFB, and empirically corroborate its efficiency on four different matrix-parametrized ML models.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Lighter-Communication Distributed Machine Learning via Sufficient Factor Broadcasting

Matrix-parametrized models (MPMs) are widely used in machine learning (ML) applications. In large-scale ML problems, the parameter matrix of a MPM can grow at an unexpected rate, resulting in high communication and parameter synchronization costs. To address this issue, we offer two contributions: first, we develop a computation model for a large family of MPMs, which share the following proper...

متن کامل

Large Scale Distributed Multiclass Logistic Regression

Multiclass logistic regression (MLR) is a fundamental machine learning model to do multiclass classification. However, it is very challenging to perform MLR on large scale data where the feature dimension is high, the number of classes is large and the number of data samples is numerous. In this paper, we build a distributed framework to support large scale multiclass logistic regression. Using...

متن کامل

Global Warming: New Frontier of Research Deep Learning- Age of Distributed Green Smart Microgrid

The exponential increase in carbon-dioxide resulting Global Warming would make the planet earth to become inhabitable in many parts of the world with ensuing mass starvation. The rise of digital technology all over the world fundamentally have changed the lives of humans. The emerging technology of the Internet of Things, IoT, machine learning, data mining, biotechnology, biometric, and deep le...

متن کامل

Learning Support Vector Machines from Distributed Data Sources

In this paper we address the problem of learning Support Vector Machine (SVM) classifiers from distributed data sources. We identify sufficient statistics for learning SVMs and present an algorithm that learns SVMs from distributed data by iteratively computing the set of sufficient statistics. We prove that our algorithm is exact with respect to its centralized counterpart and efficient in ter...

متن کامل

Conditions for Convergence in Regularized Machine Learning Objectives

Analysis of the convergence rates of modern convex optimization algorithms can be achived through binary means: analysis of emperical convergence, or analysis of theoretical convergence. These two pathways of capturing information diverge in efficacy when moving to the world of distributed computing, due to the introduction of non-intuitive, non-linear slowdowns associated with broadcasting, an...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1511.08486 شماره

صفحات -

تاریخ انتشار 2014

Distributed Machine Learning via Sufficient Factor Broadcasting

نویسندگان

چکیده

منابع مشابه

Lighter-Communication Distributed Machine Learning via Sufficient Factor Broadcasting

Large Scale Distributed Multiclass Logistic Regression

Global Warming: New Frontier of Research Deep Learning- Age of Distributed Green Smart Microgrid

Learning Support Vector Machines from Distributed Data Sources

Conditions for Convergence in Regularized Machine Learning Objectives

عنوان ژورنال:

اشتراک گذاری